information management
How to Opt Out of Google Search's New AI Data Training Feature
Google's Search history update stores media uploads from your interactions, like images used in reverse image searches, for training its AI models. A little piece of my soul shrivels up every time I get a message laying out how another company plans to use personal data in ever encroaching ways for AI training . I got one of those emails recently from Google, with the subject line: "New privacy settings for Search services." It's part of Google's global rollout happening over the next few months that will change how it handles users' Search history data. Every piece of media, from photos you upload for reverse image searches to audio of you speaking with Google Translate, may be retained in your account and used to improve Google's AI models.
7813e19a86fd73d40f7e811ab15f6d5f-Paper-Datasets_and_Benchmarks_Track.pdf
Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,972 traffic nodes) from year 2022 to 2024: our TraffiDent dataset includes traffic, i.e., time-series indexes on traffic flow, lane occupancy, and average vehicle speed, and incident, whose records are spatiotemporally aligned with traffic data, with seven different incident classes. Additionally, each node includes detailed physical and policylevel meta-attributes of lanes. Previous datasets typically contain only traffic or incident data in isolation, limiting research to general forecasting tasks.
Masayoshi Son dismisses Musk's idea for orbital data centers
Masayoshi Son dismisses Musk's idea for orbital data centers SoftBank founder Masayoshi Son dismissed the idea of space-based data centers, arguing that the AI race will be won by computing power on Earth. SoftBank Group founder Masayoshi Son said there's little merit to building data centers in space, as championed by Elon Musk, predicting that the artificial intelligence race will be clinched by computing power on Earth. The main advantage of building data centers in space would be to slash electricity costs. But such expenses comprise a small fraction of the cost of operating data centers, compared with hardware like chips, Son said during an annual shareholder meeting for SoftBank's mobile unit on Tuesday. The tradeoff for any power cost reductions would also include higher fees to transport everything into space, maintenance and communication delays, he added.
AGeneralized Binary Tree Mechanism for Private Approximation of All-Pair Shortest Distances
We study the problem of approximating all-pair distances in a weighted undirected graph with differential privacy, introduced by Sealfon [Sea16]. Given a publicly known undirected graph, we treat the weights of edges as sensitive information, and two graphs are neighbors if their edge weights differ in one edge by at most one. We obtain efficient algorithms with significantly improved bounds on a broad class of graphs which we refer to as recursively separable. In particular, for any n-vertex Kh-minor-free graph, our algorithm achieve an additive error of eO(h(nW)1/3), where W represents the maximum edge weight; For grid graphs, the same algorithmic scheme achieve additive error of eO(n1/4 W). Our approach can be seen as a generalization of the celebrated binary tree mechanism for range queries, as releasing range queries is equivalent to computing all-pair distances on a path graph. In essence, our approach is based on generalizing the binary tree mechanism to graphs that are recursively separable. JL and ZZ have been supported by National Science Foundation of China under Grant No. 62472212 and the New Cornerstone Science Foundation. Supported in part by NSF award 2228995 JU's research was funded by the NSFCNS 2433628, Google Seed Fund grant, Google Research Scholar Award, Dean Research Seed Fund, and Rutgers Decanal Grant no.
Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction
The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents.
OCN: Effectively Utilizing Higher-Order Common Neighbors for Better Link Prediction
Common Neighbors (CNs) and their higher-order variants are important pairwise features widely used in state-of-the-art link prediction methods. However, existing methods often struggle with the repetition across different orders of CNs and fail to fully leverage their potential. We identify that these limitations stem from two key issues: redundancy and over-smoothing in high-order common neighbors. To address these challenges, we design orthogonalization to eliminate redundancy between different-order CNs and normalization to mitigate over-smoothing. By combining these two techniques, we propose Orthogonal Common Neighbor (OCN), a novel approach that significantly outperforms the strongest baselines by an average of 7.7% on popular link prediction benchmarks. A thorough theoretical analysis is provided to support our method. Ablation studies also verify the effectiveness of our orthogonalization and normalization techniques. Code is available at: https://github.com/qingpingmo/OCN
CPRet: ADataset, Benchmark, and Model for Retrieval in Competitive Programming
Competive programming benchmarks are widely used in scenarios such as programming contests and large language model assessments. However, the growing presence of duplicate or highly similar problems raises concerns not only about competition fairness, but also about the validity of competitive programming as a benchmark for model evaluation. In this paper, we propose a new problem--similar question retrieval--to tackle the problem. Due to the lack of both data and models, solving this problem is challenging. To this end, we introduce CPRet, a retrievaloriented benchmark suite for competitive programming, covering four retrieval tasks: two code-centric (i.e., Text-to-Code, Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate, Simplified-to-Full)--built from a combination of automatically crawled problem-solution data and manually curated annotations. Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation. Besides, we further develop two task-specialized retrievers based on this dataset: CPRetriever-Code, trained with a novel Group-InfoNCE loss for problem-code alignment, and CPRetriever-Prob, fine-tuned for indentifying problem-level similarity. Both models achieve strong results and are open-sourced for local use. Finally, we analyze LiveCodeBench and find that high-similarity problems inflate model pass rates and reduce differentiation, underscoring the need for similarity-aware evaluation in future benchmarks.
4KAgent: Agentic Any Image to 4KSuper-Resolution
We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256 256, into crystal-clear, photorealistic 4K outputs.
SMARTraj2: AStable Multi-City Adaptive Method for Multi-View Spatio-Temporal Trajectory Representation Learning
Spatio-temporal trajectory representation learning plays a crucial role in various urban applications such as transportation systems, urban planning, and environmental monitoring. Existing methods can be divided into single-view and multi-view approaches, with the latter offering richer representations by integrating multiple sources of spatio-temporal data. However, these methods often struggle to generalize across diverse urban scenes due to multi-city structural heterogeneity, which arises from the disparities in road networks, grid layouts, and traffic regulations across cities, and the amplified seesaw phenomenon, where optimizing for one city, view, or task can degrade performance in others. These challenges hinder the deployment of trajectory learning models across multiple cities, limiting their realworld applicability. In this work, we propose SMARTraj2, a novel stable multi-city adaptive method for multi-view spatio-temporal trajectory representation learning. Specifically, we introduce a feature disentanglement module to separate domaininvariant and domain-specific features, and a personalized gating mechanism to dynamically stabilize the contributions of different views and tasks. Our approach achieves superior generalization across heterogeneous urban scenes while maintaining robust performance across multiple downstream tasks. Extensive experiments on benchmark datasets demonstrate the effectiveness of SMARTraj2 in enhancing cross-city generalization and outperforming state-of-the-art methods.